Abstract - In this article we provide a comprehensive review of
the different evolutionary algorithm techniques used to address
multimodal optimization problems, classifying them according
to the nature of their approach. On the one hand, there are
algorithms that address early convergence to a local optimum by
dividing the individuals of the population into groups and
limiting their interaction, so that each group evolves with a
high degree of independence. On the other hand, other approaches
directly address the lack of genetic diversity in the population
by introducing elements into the evolutionary dynamics that
promote the exploration of new niches of the genotypical space.
Finally, we study multi-objective optimization genetic
algorithms, which handle situations where multiple criteria have
to be satisfied with no penalty for any of them. A very rich
literature has arisen over the years on these topics, and we aim
to offer an overview of the most important techniques of each
branch of the field.

I. Introduction 

Genetic Algorithms aim at exploring the genotypical space
to find an individual whose associated phenotype optimizes a
predefined fitness function. When the fitness function presents
multiple local optima, the problem is said to be a multimodal
optimization problem.


The desired behaviour of a GA when applied to this type of
problem is not to get stuck in local optima, but to find the
global optimum of the function. This very concept was
introduced in the context of biological evolution by Sewall
Wright, who described fitness landscapes in 1932 in (T):

In a rugged field of this character, selection will 
easily carry the species to the nearest peak, but there 
may be innumerable other peaks which are higher 
but which are separated by valleys. The problem 
of evolution as I see it is that of a mechanism
by which the species may continually find its way 
from lower to higher peaks in such a field. In order 
for this to occur, there must be some trial and error 
mechanism on a grand scale by which the species 
may explore the region surrounding the small portion 
of the field which it occupies.


This type of scenario requires the GA to develop strategies
to cover the whole genotypical space without converging to a
local optimum. Throughout this article we explore the different
techniques applied to genetic algorithms to improve their
effectiveness in multimodal fitness problems.

Multimodal optimization problems should not be confused with
multi-objective optimization: while multimodal optimization
tries to find the optimum of a single fitness function that has
multiple local optima, multi-objective optimization tries to
find a balance when optimizing several fitness functions at the
same time. Nevertheless, they have many commonalities, and
concepts from each world can be applied to the other. For
instance, current multi-objective genetic algorithms tend to
make use of multimodal optimization mechanisms in one of their
steps, normally crowding or fitness sharing (see section II-B1),
and recent reinterpretations of multi-objective genetic
algorithms make it possible to explicitly handle multimodal
single-objective optimization problems.

In section II we explore the different groups of techniques,
covering first the approaches that structure the population
so that the individuals are assigned to subpopulations with
limited cross-interaction (section II-A) and then the algorithms
that condition the evolutionary dynamics to promote diversity,
aiming at avoiding early convergence to a local optimum. We
also explore multi-objective optimization algorithms (section
II-C), focusing on their aspects related to multimodality. After
this glance at the available techniques, section III provides a
brief discussion of their pros and cons. Finally, our
conclusions are presented in section IV.


II. State of the art

Throughout this section we explore the different
genetic algorithm branches devoted to multimodal problem
optimization, describing the most remarkable representatives
of each of the approaches.

It should be noted that in the literature many of the
algorithms described in the following sections are labeled
under the term niching methods. The uses of this wording are
diverse, normally comprising GAs based on the spatial
distribution of the population (section II-A2) and some GAs
based on the explicit control of the diversity within the
population (section II-B). However, due to the broad use of the
word, we shall not use it, in order to avoid misunderstandings.


A. Structured Population GAs (implicit diversity promotion) 

Structured Population Genetic Algorithms do not explicitly
measure and enforce diversity, but impose certain constraints
on the population with the aim of regulating its dynamics. The
two subgroups among this type of algorithm are those that
explicitly partition the population and those that introduce
measures leading individuals to cluster into subpopulations.
Both families are studied in the following subsections.
















1) Algorithms based on the Spatial Segregation of One
Population: Evolutionary Algorithms relying on spatial
segregation usually divide the population into completely
segregated groups that exchange genetic material at certain
points in time. This aims at avoiding the homogeneity of
panmictic approaches by keeping several homogeneous groups
and mixing them at controlled intervals, hence profiting from
the good local optima found by each subgroup.

• Island Model GAs (aka Parallel GAs, Coarse-grained
GAs) manage subpopulations (islands, demes), each
one evolving separately with its own dynamics (e.g.
mutation rate, population size), but exchanging some
individuals at certain points (i.e. a new genetic
operator: migration). The algorithm is subject to
several design decisions (0): the number of islands
and the migration topology among them (individuals
from which islands can migrate to which islands), the
synchronism of the migration (asynchronous migration
scales better and profits from underlying hardware
parallelism, but is non-deterministic, non-repeatable
and difficult to debug), the migration frequency,
whether the migration is initiated by the source or by
the destination, and the migration selection and
migration replacement policies.

• Spatially-Dispersed GAs ([3]) associate a two-dimensional
coordinate with every individual (initial positions
are assigned at random), and place offspring randomly
but close to the first of the parents. Mating is
only allowed with individuals located within a
certain visibility radius. These dynamics lead to the
progressive spread and accumulation of the originally
randomly distributed individuals into clusters
(demes) that resemble subpopulations. The value of
the aforementioned visibility radius is not important,
as the populations spread according to its scale without
performance penalties.
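
The migration operator of the Island Model can be illustrated with a short sketch. It is only one of many possible configurations of the design decisions listed above: it assumes a unidirectional ring topology, synchronous migration at a fixed interval, a "send the best, replace the worst" policy and a toy OneMax fitness. The names `evolve_island`, `migrate_ring` and `island_ga` are hypothetical and chosen for illustration only.

```python
import random

def fitness(ind):
    # Toy OneMax fitness (assumption): number of 1-bits in the genotype.
    return sum(ind)

def evolve_island(pop, rng, mutation_rate):
    # One generation of a deliberately simple GA: binary tournament selection,
    # one-point crossover and bit-flip mutation, with a per-island mutation rate.
    new_pop = []
    while len(new_pop) < len(pop):
        p1 = max(rng.sample(pop, 2), key=fitness)
        p2 = max(rng.sample(pop, 2), key=fitness)
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        new_pop.append(child)
    return new_pop

def migrate_ring(islands, n_migrants):
    # Synchronous migration on a unidirectional ring: island i receives copies
    # of the best individuals of island i-1, which replace its worst ones.
    emigrants = [sorted(pop, key=fitness, reverse=True)[:n_migrants]
                 for pop in islands]
    for i, pop in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        pop.sort(key=fitness)  # worst individuals first
        pop[:n_migrants] = [list(m) for m in incoming]
    return islands

def island_ga(n_islands=4, pop_size=20, n_bits=16, generations=40,
              migration_interval=10, n_migrants=2, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    rates = [0.01 * (i + 1) for i in range(n_islands)]  # own dynamics per island
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, rng, r) for pop, r in zip(islands, rates)]
        if gen % migration_interval == 0:
            migrate_ring(islands, n_migrants)
    return islands
```

Between migrations the islands do not interact, so the calls to `evolve_island` could be dispatched to separate workers; the sketch keeps them sequential for simplicity.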

2) Algorithms based on Spatial Distribution of One
Population: Genetic Algorithms belonging to this family do not
impose hard divisions among groups of individuals, but induce
their clustering by means of constraints on their evolutionary
dynamics. The most remarkable approaches are:

• Diffusion Model (HI, 0) keeps two subpopulations,
said to be of different species. The individuals
from both populations are spread over the same
two-dimensional toroidal grid, ensuring that each cell
contains only one member from each population. Mating is
restricted to individuals from the same species within
the neighbouring cells, following a fitness-proportionate
scheme. Replacement follows the opposite approach
from mating, that is, offspring probabilistically
replace their parents in the neighbourhood. Both
populations compete to be the fittest, hence they co-evolve
but do not mix with the other species.

• Cellular GAs (cGA) ([5]) is the name of the family
of GAs evolved from the Diffusion Model. They also
adjust the selection pressure and have the concept
of neighbourhoods, but improve the base idea in
several different directions. Some of the remarkable
contributions are:

o Terrain-Based Genetic Algorithms (TBGA)
([7], [8]) is a self-tuning version of cGA,
where each grid cell of the two-dimensional
world is assigned a different combination of
parameters. The cells then evolve separately, each
one mating with its up, down, left and right
neighbours. This algorithm can be used not
only to address the optimization problem itself,
but also to find a set of suitable parameter values
to be used in a normal cGA. In fact, the authors
admit that a normal cGA using the parameters
found by their TBGA performs better than the
TBGA itself.

o Genetic and Artificial Life Environment
(GALE) (10) offers the concept of empty cells,
where neighbouring offspring are placed. If
no empty cells are present after breeding a
cell, the new individuals replace the worst performing
individuals from their original neighbourhood.
This algorithm also makes use of fitness sharing
(see section II-B1).

o Co-evolutionary approaches like the one in [10],
applied to the improvement of sorting networks, where
two species (referred to as hosts/parasites or
prey/predators) co-evolve. Hosts are meant to sort some
input data, while parasites represent test data
to be supplied to a host as input. The fitness
of each group is opposed to that of the other group:
the fitness of a host depends on how many
test cases (i.e. parasites) it has
succeeded in sorting, while the fitness of a
parasite depends on how many times it made
a host fail at sorting.

o Multi-objective variations of cGA, namely
cMOGA and MOCell, which are addressed in
section II-C.
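
The neighbourhood-restricted mating shared by this family of algorithms can be sketched as follows. The sketch is a simplification made under several assumptions that do not come from any specific cGA variant: a single species, a toroidal grid with the up/down/left/right neighbourhood, mating with the fittest neighbour, a synchronous update, and replacement only when the offspring is at least as fit as the current occupant. The function names and the toy OneMax fitness are hypothetical.

```python
import random

def fitness(ind):
    # Toy OneMax fitness (assumption): number of 1-bits in the genotype.
    return sum(ind)

def von_neumann_neighbours(row, col, n_rows, n_cols):
    # Up, down, left and right cells on a toroidal (wrap-around) grid.
    return [((row - 1) % n_rows, col), ((row + 1) % n_rows, col),
            (row, (col - 1) % n_cols), (row, (col + 1) % n_cols)]

def cga_step(grid, rng, mutation_rate=0.05):
    # Synchronous update: every cell mates with its fittest neighbour and the
    # offspring takes over the cell only if it is at least as fit as the occupant.
    n_rows, n_cols = len(grid), len(grid[0])
    new_grid = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            current = grid[r][c]
            neighbours = von_neumann_neighbours(r, c, n_rows, n_cols)
            mate = max((grid[i][j] for i, j in neighbours), key=fitness)
            cut = rng.randrange(1, len(current))
            child = current[:cut] + mate[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            new_grid[r][c] = (child if fitness(child) >= fitness(current)
                              else current)
    return new_grid
```

Because genetic material can only travel one cell per step, good solutions spread slowly across the grid, which is what lets distant regions settle on different optima.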

3) Algorithms imposing other mating restrictions: These
algorithms impose mating restrictions based on other criteria,
normally mimicking the high-level dynamics of existing
real-world environments. The most remarkable ones present in the
literature are:

• Multinational Evolutionary Algorithms ([11]) divide
the world into nations and partition the population
among them, also assigning different roles within each
nation, namely politicians, police and normal people.
Their interaction and mating dynamics are defined by
pre-established social rules.

• Religion-Based Evolutionary Algorithms (El) assign
each individual to a religion and define
genetic operators for converting between religions.
Mating is hence restricted to individuals with the same
beliefs.

• Age Structure GAs ([13]) define the lifecycle of
individuals and constrain mating to individuals in the
same age group.






B. Diversity Enforcing Techniques 

The main trait of this group of algorithms is that they
define a measure of the distribution of the population diversity
over the genotypical space and act upon local accumulations of
individuals, favouring their migration to new niches.

1) Fitness sharing: Fitness Sharing GAs (HU) are based
on having each individual's fitness shared with its
neighbours. The neighbourhood is defined as the individuals within
a certain radius $\sigma_{share}$ under an established distance
metric (e.g. Euclidean distance, Hamming distance). This way, the
new fitness F' of an individual i is calculated based on its
distance d to every neighbour j as:


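$$F'(i) = \frac{F(i)}{\sum_{j} \mathrm{sharingfunction}(d(i, j))} \qquad (1)$$

where the sharing function receives as input the distance
between two individuals and is computed as:

$$\mathrm{sharingfunction}(d) = \begin{cases} 1 - \left( \dfrac{d}{\sigma_{share}} \right)^{\alpha} & \text{if } d < \sigma_{share} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

with $\alpha$ defining the shape of the sharing function (i.e.
$\alpha = 1$ for linear sharing).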


This way, convergence to a single area of the fitness
landscape is discouraged by pretending there are limited
resources there. The more individuals try to move in, the more
neighbours the fitness has to be shared with. Hence, for
individuals in crowded areas, another region of the
fitness space eventually becomes more attractive. Ideally, the
algorithm stabilizes at a point where an appropriate
representation of each niche is maintained.
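
Equations (1) and (2) translate almost directly into code. The following is a minimal sketch, assuming that the sum in (1) runs over the whole population including the individual itself (so the denominator is never zero) and that the raw fitness is positive and maximized. The function and parameter names are illustrative.

```python
def sharing_function(d, sigma_share, alpha=1.0):
    # Equation (2): 1 - (d / sigma_share)^alpha inside the niche radius,
    # 0 outside of it.
    return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

def shared_fitness(population, raw_fitness, distance, sigma_share, alpha=1.0):
    # Equation (1): divide each raw fitness by its niche count. The sum
    # includes the individual itself (d = 0 contributes 1).
    shared = []
    for i in population:
        niche_count = sum(
            sharing_function(distance(i, j), sigma_share, alpha)
            for j in population
        )
        shared.append(raw_fitness(i) / niche_count)
    return shared

# Three individuals crowd one region while a fourth one is isolated.
population = [0.0, 0.1, 0.2, 5.0]
result = shared_fitness(population, lambda x: 1.0,
                        lambda a, b: abs(a - b), sigma_share=1.0)
```

With equal raw fitness, the isolated individual at 5.0 keeps its full fitness of 1.0, while the three crowded individuals drop to roughly 0.36 to 0.37 each, which is the pressure that pushes the population towards unoccupied niches.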

2) Clearing: Clearing GAs ( 133 ) divide the population
into subpopulations according to a dissimilarity measure (e.g.
Hamming distance). For each subpopulation, in the selection
phase the fittest individual is considered the winner (normally
referred to as the dominant individual). Then the other
individuals have their dissimilarity to the
winner calculated. If the distance of an individual to
a winner is smaller than a certain threshold
(the clearing radius), it belongs to that winner's
subpopulation and gets its fitness set to zero (i.e. it gets
cleared). After the whole population has been processed, the
subpopulations are recalculated based on the very same
clearing radius.
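
A minimal sketch of the clearing step follows, written in the commonly described winner-takes-all form in which individuals lying within the clearing radius of a fitter winner have their fitness reset to zero. It assumes positive fitness values and maximization. The `capacity` parameter (number of winners allowed per niche) is an illustrative generalization; with its default of 1 only the dominant individual survives.

```python
def clearing(population, raw_fitness, distance, clearing_radius, capacity=1):
    # Process individuals from fittest to least fit. Each one that still has
    # a non-zero fitness becomes the winner of a niche; later individuals
    # within the clearing radius are cleared once the niche is full.
    fitness = [raw_fitness(ind) for ind in population]
    order = sorted(range(len(population)),
                   key=lambda k: fitness[k], reverse=True)
    for pos, i in enumerate(order):
        if fitness[i] <= 0.0:
            continue  # already cleared by a fitter winner
        winners = 1
        for j in order[pos + 1:]:
            close = distance(population[i], population[j]) < clearing_radius
            if fitness[j] > 0.0 and close:
                if winners < capacity:
                    winners += 1
                else:
                    fitness[j] = 0.0
    return fitness

# Two clusters of real-valued individuals, one around 0 and one around 5.
population = [0.0, 0.1, 0.2, 5.0, 5.1]
cleared = clearing(population, lambda x: x + 1.0,
                   lambda a, b: abs(a - b), clearing_radius=1.0)
```

In this example only the fittest member of each cluster (0.2 and 5.1) keeps its fitness; the other three individuals are cleared.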

3) Crowding: Crowding GAs associate every individual
bred in the current generation with another individual from
the parent generation (pairing phase) and keep only one of the
two in the population (replacement phase). The association
is established based on genotypical similarity criteria (e.g.
Manhattan distance, Euclidean distance). This approach favours
the growth of individuals around underpopulated regions of
the solution space and penalizes overcrowded areas, because
only similar individuals get replaced.
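
The pairing and replacement phases can be sketched as follows. The sketch picks one specific crowding scheme for brevity, the one commonly known as deterministic crowding, in which each offspring competes only against its own parents. The `crossover` and `mutate` callables and all names are assumptions supplied by the caller rather than part of any reference implementation.

```python
import random

def crowding_generation(population, fitness, distance, crossover, mutate, rng):
    # One generation: shuffle, take parents two by two, breed two offspring,
    # pair each offspring with its most similar parent, keep the fitter one.
    pop = population[:]
    rng.shuffle(pop)
    next_pop = []
    for k in range(0, len(pop) - 1, 2):
        p1, p2 = pop[k], pop[k + 1]
        c1, c2 = crossover(p1, p2, rng)
        c1, c2 = mutate(c1, rng), mutate(c2, rng)
        # Pairing phase: choose the parent/offspring matching that yields
        # the smallest total distance.
        same = distance(p1, c1) + distance(p2, c2)
        crossed = distance(p1, c2) + distance(p2, c1)
        pairs = [(p1, c1), (p2, c2)] if same <= crossed else [(p1, c2), (p2, c1)]
        # Replacement phase: only one member of each pair survives.
        for parent, child in pairs:
            next_pop.append(child if fitness(child) >= fitness(parent) else parent)
    if len(pop) % 2 == 1:
        next_pop.append(pop[-1])  # an unpaired individual survives unchanged
    return next_pop
```

Since an offspring can only displace the parent it resembles most, an individual sitting on one peak is never replaced by a fitter individual from a different peak.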


In the original algorithm formulation, De Jong used a
Crowding Factor parameter (CF) (section 4.7 of [16]) to
specify the size of the sample of individuals initially selected at
random as candidates to be replaced by a particular offspring,
among which the one most similar to the offspring is finally
chosen. He found that the original formulation of
the algorithm failed to prevent genetic drift in many cases.

Mahfoud improved on the algorithm by identifying several
weak points, most notably its focus on maintaining global
diversity ([17], [18], [19]), and addressed them by introducing
a different diversity measure that favoured niching, namely
the number of peaks maintained by the population. With
the new measure, Mahfoud re-evaluated De Jong's misleading
conclusions (i.e. that a CF higher than 1 led to genetic drift)
and reformulated the algorithm to use only the individual's
parents as candidates for replacement (hence drastically
reducing the computational complexity). This variation is
called Deterministic Crowding.

Mengshoel ([20]) proposed a variation called Probabilistic
Crowding in which the survivor of each replacement is not
always the fitter contender but is chosen probabilistically,
each contender surviving with a probability proportional to
its fitness, hence favouring the conservation of low-fitness
individuals and avoiding genetic drift toward the high-fitness
niche.


C. Multi-objective evolutionary algorithms

Multi-Objective Optimization problems are characterized
by the need to find proper trade-offs among different criteria,
each of them quantified by means of an objective function,
formally ([21]):

A general Multi-objective Optimization Problem
(MOP) is defined as minimizing (or maximizing)
$F(x) = (f_1(x), \ldots, f_k(x))$ subject to
$g_i(x) \leq 0$, $i = 1, \ldots, m$, and $h_j(x) = 0$,
$j = 1, \ldots, p$. An MOP solution minimizes (or
maximizes) the components of a vector $F(x)$ where
$\vec{x}$ is an n-dimensional decision variable vector
$\vec{x} = (x_1, \ldots, x_n)$ from some universe
$\Omega$. It is noted that $g_i(x) \leq 0$ and
$h_j(x) = 0$ represent constraints that must be
fulfilled while minimizing (or maximizing) $F(x)$, and
$\Omega$ contains all possible $x$ that can be used to
satisfy an evaluation of $F(x)$.

The evaluation function, $F : \Omega \to \Lambda$, maps from the
decision variable space $\vec{x} = (x_1, \ldots, x_n)$ to the
objective function space $\vec{y} = (f_1(\vec{x}), \ldots, f_k(\vec{x}))$.^1

1 In this equation we use the $\vec{x}$ notation to clarify the vectorial
nature of the parameter of $f_i$, but we will not use it in the rest of the
report.

The category of multi-objective optimization normally
refers to approaches that are defined in terms of Pareto
Optimality ([21]):

A solution $x \in \Omega$ is said to be Pareto Optimal
with respect to $\Omega$ if and only if there is no
$x' \in \Omega$ for which
$v = F(x') = (f_1(x'), \ldots, f_k(x'))$ dominates
$u = F(x) = (f_1(x), \ldots, f_k(x))$.

Here a vector $u$ is said to dominate another vector $v$ if
$u$ is (assuming minimization) partially less than $v$, that
is, $u_i \leq v_i$ for every $i$ and $\exists i : u_i < v_i$.

This means that $x$ is Pareto optimal if there exists no
vector which would decrease some criterion without causing
a simultaneous increase in at least one other criterion (again
assuming minimization).


When we plot all the objective function vectors that are
nondominated, the obtained curve is usually referred to as the
Pareto front.
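
The dominance relation and the extraction of the nondominated set can be written in a few lines. This is a minimal sketch assuming minimization of every objective and objective vectors given as tuples. The quadratic-time filter is chosen for clarity, not efficiency, and the function names are illustrative.

```python
def dominates(u, v):
    # Minimization: u dominates v if it is no worse in every objective
    # and strictly better in at least one.
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better

def pareto_front(points):
    # Keep the objective vectors that no other vector dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(points)
# front == [(1, 5), (2, 3), (4, 1)]
```

Here (3, 4) and (5, 5) are discarded because (2, 3) is better in both objectives, while the three remaining vectors represent different trade-offs that cannot be ranked against each other.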

A Multi-objective Optimization Evolutionary
Algorithm (MOEA) consists of the application of GAs
to a MOP. The mechanism of a MOEA is the same as that of a
normal GA. Their only difference is that, instead of a single
fitness function, MOEAs compute k fitness functions and then
perform a transformation on them in order to obtain a single
measure, which is then used as the fitness in a normal GA. At
each generation, the output of a MOEA is the current set of
Pareto optimal solutions (i.e. the Pareto front).^2


There exist multiple variations of MOEAs, differing
in either the way they combine the individual
objective functions into the fitness value (a priori techniques) or
the post-processing they do to ensure Pareto optimality.


Deb recently proposed a MOEA ([22], [23]) for addressing
single-objective multimodal optimization problems. This
algorithm defines a suitable second objective and adds it to the
originally single-objective multimodal optimization problem,
so that the multiple solutions form a Pareto-optimal front. This
way, the single-objective multimodal optimization problem,
artificially turned into a MOP, can be solved for its multiple
solutions using the MOEA.

III. Discussion 

One of the most attractive traits of spatial segregation
GAs is that each of the population subgroups can be evolved
in parallel, which makes them well suited to parallel
architectures such as multicore machines or supercomputing
facilities.


Another potentially attractive characteristic of this type
of algorithm is that it is to some degree independent
of the optimization algorithm. This makes it possible to use a
different optimization algorithm (not necessarily a GA) on each
island ([2]).


However, a significant concern about them is that they
need to be carefully tuned in order to perform well. For
instance, fitness sharing (section II-B1) requires the niche
radius to be set manually, and the algorithm is quite sensitive
to this choice. Failure to properly tune the algorithm parameters
normally results in performance significantly below that of an
equivalent panmictic implementation ([3]). In this regard,
Spatially-Dispersed GAs require less configuration tuning
than Island Model GAs.


2 Some MOEAs make use of a secondary population acting as an archive, 
where they store all the nondominated solutions found through the generations 


Some self-adapting options are interesting in that they do
not need such configuration tuning at all. However, these
algorithms often perform worse than their equivalent fine-tuned
non-adaptive versions ([7], [8]).

One of the most significant problems among the reviewed
techniques is the one suffered by those that rely on structuring
the population (section II-A), which cannot guarantee an
improvement in the coverage of the solution space because they
are not based on a measure of the distribution of the
diversity ([24]). As a result, these algorithms offer good
performance for some problems, but worse performance than
panmictic approaches for others, with no apparent reason.
Moreover, given their loose relation to diversity control, it is
often impossible to diagnose or fix the root cause of the
algorithm underperforming on a certain problem.


Most of the sources in the literature of recent years
tend to agree that Crowding (section II-B3) is effective for
any multimodal optimization problem. Nevertheless, promising
MOEAs ([22], [23]) may also play an important role in the
upcoming years.


IV. Conclusions 

In most real-world optimization problems, the fitness
landscape is unknown to us. This means that there is a
non-negligible chance that it is multimodal. Failing to
acknowledge this, and to act accordingly, is likely to make the
optimization converge too early to a local optimum.


Among the reviewed techniques, the only one with
consensus among the scientific community regarding general
effectiveness is Crowding. All other options have proven
valuable for many concrete problems, but they
exhibit suboptimal performance compared to panmictic
approaches for some other problems.


This tells us that selecting the appropriate multimodal
optimization genetic algorithm cannot be addressed a priori,
but has to undergo a trial-and-error process, driven by the
intuition of the researchers to choose an approach that has
proven effective for seemingly analogous or similar problems.